clusterProfiler: Bioconductor, Paper, Documentation
enrichplot: Bioconductor
Demo Dataset: E-MTAB-8411 from The clock gene Bmal1 inhibits macrophage motility, phagocytosis, and impairs defense against pneumonia. PNAS. 2020;117(3):1543-1551.
License: GPL-3.0
Rcd /ngs/GO-Enrichment-Analysis-Demo
R
If you have downloaded the DESeq2_DEG.txt file with wget:
If you like to donwload the file in R now:
data <- data.table::fread("https://raw.githubusercontent.com/ycl6/GO-Enrichment-Analysis-Demo/master/DESeq2_DEG.txt")
data$GeneID <- substr(data$GeneID, 1, 18)## GeneID GeneSymbol log2fc pvalue padj
## 1: ENSMUSG00000000001 Gnai3 0.08804493 0.1925609 0.6146732
## 2: ENSMUSG00000000003 Pbsn NA NA NA
## 3: ENSMUSG00000000028 Cdc45 -0.32106635 0.1401127 0.5331437
## 4: ENSMUSG00000000031 H19 -1.20339889 0.7161464 NA
## 5: ENSMUSG00000000037 Scml2 -0.57746426 0.2979159 NA
## ---
## 55381: ENSMUSG00000118636 AC117663.3 NA NA NA
## 55382: ENSMUSG00000118637 AL772212.1 NA NA NA
## 55383: ENSMUSG00000118638 AL805980.1 NA NA NA
## 55384: ENSMUSG00000118639 AL590997.4 NA NA NA
## 55385: ENSMUSG00000118640 AC167036.2 0.06415390 0.9208414 NA
up.idx <- which(data$padj < 0.05 & data$log2fc > 0) # FDR < 0.05 and logFC > 0
dn.idx <- which(data$padj < 0.05 & data$log2fc < 0) # FDR < 0.05 and logFC < 0## [1] 55385 5
## [1] 383
## [1] 429
all.genes <- data$GeneSymbol
up.genes <- data[up.idx,]$GeneSymbol
dn.genes <- data[dn.idx,]$GeneSymbol## [1] "Axin2" "Hnrnpd" "Kcnn3" "Mapk7" "Agpat3" "Sema6b" "Efnb2" "Il16" "Ltbp1"
## [10] "Rgs19"
## [1] "Cox5a" "Pdgfb" "Itga5" "Cd52" "Dnmt3l" "Tubb6" "Ell2" "Ifrd1" "Stk38l"
## [10] "Ubl3"
Alternatively, if you only have Ensembl gene ID
## [1] "clusterProfiler_GO-BP_ORA_simplify"
We use all the annotated genes (all.genes) as the “gene universe” and the differentially expressed genes (up.genes and dn.genes) as our genes of interest.
We would need to convert any other identifier format to ENTREZID which is the required input identifier format. This can be done by using the select function from AnnotationDbi that we saw in Part 1 of this demo, or by using the “Biological Id TRanslator” bitr function from clusterProfiler which is a wrapper function of AnnotationDbi::select.
Here, we will use bitr here to see how this can be done.
# Use fromType = "ENSEMBL" if your input identifier is Ensembl gene ID
all.genes.df = bitr(all.genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Mm.eg.db")## 'select()' returned 1:1 mapping between keys and columns
## SYMBOL ENTREZID
## 1 Gnai3 14679
## 2 Pbsn 54192
## 3 Cdc45 12544
## 4 H19 14955
## 5 Scml2 107815
## 6 Apoh 11818
## 7 Narf 67608
## 8 Cav2 12390
## 9 Klf6 23849
## 10 Scmh1 29871
## 'select()' returned 1:1 mapping between keys and columns
## SYMBOL ENTREZID
## 1 Axin2 12006
## 2 Hnrnpd 11991
## 3 Kcnn3 140493
## 4 Mapk7 23939
## 5 Agpat3 28169
## 6 Sema6b 20359
## 7 Efnb2 13642
## 8 Il16 16170
## 9 Ltbp1 268977
## 10 Rgs19 56470
## 'select()' returned 1:1 mapping between keys and columns
## SYMBOL ENTREZID
## 1 Cox5a 12858
## 2 Pdgfb 18591
## 3 Itga5 16402
## 4 Cd52 23833
## 5 Dnmt3l 54427
## 6 Tubb6 67951
## 7 Ell2 192657
## 8 Ifrd1 15982
## 9 Stk38l 232533
## 10 Ubl3 24109
pvalueCutoff on unadjusted p-values, (2) pvalueCutoff on adjusted p-values and (3) qvalueCutoff on q-values.upEGO = enrichGO(gene = up.genes.df$ENTREZID, universe = all.genes.df$ENTREZID,
OrgDb = "org.Mm.eg.db", ont = ontology, pvalueCutoff = 0.05,
pAdjustMethod = "BH", qvalueCutoff = 0.05, readable = TRUE)
upEGO## #
## # over-representation test
## #
## #...@organism Mus musculus
## #...@ontology BP
## #...@keytype ENTREZID
## #...@gene chr [1:373] "12006" "11991" "140493" "23939" "28169" "20359" "13642" "16170" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05
## #...34 enriched terms found
## 'data.frame': 34 obs. of 9 variables:
## $ ID : chr "GO:0050900" "GO:0002685" "GO:0097529" "GO:0060326" ...
## $ Description: chr "leukocyte migration" "regulation of leukocyte migration" "myeloid leukocyte migration" "cell chemotaxis" ...
## $ GeneRatio : chr "23/345" "14/345" "14/345" "17/345" ...
## $ BgRatio : chr "338/22070" "200/22070" "205/22070" "297/22070" ...
## $ pvalue : num 0.00000000425 0.00000344608 0.00000459267 0.00000491872 0.00000679209 ...
## $ p.adjust : num 0.0000157 0.0045424 0.0045424 0.0045424 0.005018 ...
## $ qvalue : num 0.0000136 0.0039181 0.0039181 0.0039181 0.0043284 ...
## $ geneID : chr "Il16/Lbp/Mmp9/Lgmn/Cd200r1/Itgb1/Selp/Itga4/Itga6/Bst1/Coro1a/Itgam/Vav3/Ripor2/Gcnt1/Itga9/Ptger4/Tnfrsf14/Mtu"| __truncated__ "Lbp/Mmp9/Lgmn/Cd200r1/Selp/Itga4/Bst1/Ripor2/Ptger4/Tnfrsf14/Mtus1/Nod2/Ano6/C5ar2" "Lbp/Lgmn/Cd200r1/Bst1/Itgam/Vav3/Ripor2/Itga9/Ptger4/Mtus1/Nod2/Prtn3/Ano6/C5ar2" "Il16/Lbp/Elmo2/Lgmn/Bst1/Coro1a/Itgam/Fgfr1/Bcar1/Vav3/Dock4/Ripor2/Itga9/Mtus1/Nod2/Ano6/C5ar2" ...
## $ Count : int 23 14 14 17 8 22 15 10 13 16 ...
## #...Citation
## Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
## clusterProfiler: an R package for comparing biological themes among
## gene clusters. OMICS: A Journal of Integrative Biology
## 2012, 16(5):284-287
dnEGO = enrichGO(gene = dn.genes.df$ENTREZID, universe = all.genes.df$ENTREZID,
OrgDb = "org.Mm.eg.db", ont = ontology, pvalueCutoff = 0.05,
pAdjustMethod = "BH", qvalueCutoff = 0.05, readable = TRUE)
dnEGO## #
## # over-representation test
## #
## #...@organism Mus musculus
## #...@ontology BP
## #...@keytype ENTREZID
## #...@gene chr [1:421] "12858" "18591" "16402" "23833" "54427" "67951" "192657" "15982" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05
## #...211 enriched terms found
## 'data.frame': 211 obs. of 9 variables:
## $ ID : chr "GO:0002224" "GO:0002221" "GO:0002237" "GO:0042060" ...
## $ Description: chr "toll-like receptor signaling pathway" "pattern recognition receptor signaling pathway" "response to molecule of bacterial origin" "wound healing" ...
## $ GeneRatio : chr "15/389" "15/389" "24/389" "24/389" ...
## $ BgRatio : chr "103/22070" "137/22070" "358/22070" "360/22070" ...
## $ pvalue : num 0.000000000376 0.000000020742 0.000000026746 0.000000029737 0.000000059157 ...
## $ p.adjust : num 0.00000147 0.00002902 0.00002902 0.00002902 0.00004619 ...
## $ qvalue : num 0.00000119 0.00002365 0.00002365 0.00002365 0.00003764 ...
## $ geneID : chr "Nr1h3/Cyba/Ptprs/Mapkapk2/Tnip1/Nr1d1/Traf3/Acod1/Tlr2/Lrrfip2/Tnip3/Tlr1/Ticam1/Rab7b/Irak2" "Nr1h3/Cyba/Ptprs/Mapkapk2/Tnip1/Nr1d1/Traf3/Acod1/Tlr2/Lrrfip2/Tnip3/Tlr1/Ticam1/Rab7b/Irak2" "Nr1h3/Axl/Mapkapk2/Cxcl16/Nr1d1/Acod1/Trem2/Slc11a1/Il1rn/Tlr2/Nfkb1/Tnfrsf1b/Irf5/Nfkbib/Malt1/Serpine1/Ptgir/"| __truncated__ "Pdgfb/Ifrd1/Vwf/Axl/Elk3/Nfe2l2/Gna13/Evl/Hmgcr/Plec/Cdkn1a/Akirin1/Tnfrsf12a/Papss2/Pdgfa/Slc11a1/Procr/Slc7a1"| __truncated__ ...
## $ Count : int 15 15 24 24 28 20 24 22 24 26 ...
## #...Citation
## Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
## clusterProfiler: an R package for comparing biological themes among
## gene clusters. OMICS: A Journal of Integrative Biology
## 2012, 16(5):284-287
We then use the simplify function to reduce redundancy of enriched GO terms. We will used the default parameters to run the function.
upSimGO = simplify(upEGO, cutoff = 0.7, by = "p.adjust", select_fun = min, measure = "Wang",
semData = NULL)
nrow(upSimGO)## [1] 20
dnSimGO = simplify(dnEGO, cutoff = 0.7, by = "p.adjust", select_fun = min, measure = "Wang",
semData = NULL)
nrow(dnSimGO)## [1] 98
We will use the barplot function from enrichplot (previously a function in clusterProfiler)
png(paste0(outTitle, "_up.png"), width = 9, height = 6, units = "in", res = 300)
barplot(upSimGO, showCategory = 20) +
ggtitle(paste0("GO-", ontology," ORA of up-regulated genes")) +
xlab("Enriched terms") + ylab("Count")
invisible(dev.off())“clusterProfiler_GO-BP_ORA_simplify_up.png”
png(paste0(outTitle, "_dn.png"), width = 9, height = 6, units = "in", res = 300)
barplot(dnSimGO, showCategory = 20) +
ggtitle(paste0("GO-", ontology," ORA of down-regulated genes")) +
xlab("Enriched terms") + ylab("Count")
invisible(dev.off())“clusterProfiler_GO-BP_ORA_simplify_dn.png”
Save enrichGO results as a data.frame
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/r4/lib/libopenblasp-r0.3.10.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
## [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets methods
## [9] base
##
## other attached packages:
## [1] ggplot2_3.3.2 org.Mm.eg.db_3.11.4 AnnotationDbi_1.50.3
## [4] IRanges_2.22.2 S4Vectors_0.26.1 Biobase_2.48.0
## [7] BiocGenerics_0.34.0 enrichplot_1.8.1 clusterProfiler_3.16.0
## [10] knitr_1.29
##
## loaded via a namespace (and not attached):
## [1] bit64_4.0.2 RColorBrewer_1.1-2 progress_1.2.2 httr_1.4.2
## [5] tools_4.0.2 R6_2.4.1 DBI_1.1.0 colorspace_1.4-1
## [9] withr_2.2.0 tidyselect_1.1.0 gridExtra_2.3 prettyunits_1.1.1
## [13] bit_4.0.4 compiler_4.0.2 scatterpie_0.1.4 xml2_1.3.2
## [17] labeling_0.3 triebeard_0.3.0 scales_1.1.1 ggridges_0.5.2
## [21] stringr_1.4.0 digest_0.6.25 rmarkdown_2.3 DOSE_3.14.0
## [25] pkgconfig_2.0.3 htmltools_0.5.0 highr_0.8 rlang_0.4.7
## [29] RSQLite_2.2.0 gridGraphics_0.5-0 farver_2.0.3 generics_0.0.2
## [33] jsonlite_1.7.0 BiocParallel_1.22.0 GOSemSim_2.14.1 dplyr_1.0.1
## [37] magrittr_1.5 ggplotify_0.0.5 GO.db_3.11.4 Matrix_1.2-18
## [41] Rcpp_1.0.5 munsell_0.5.0 viridis_0.5.1 lifecycle_0.2.0
## [45] stringi_1.4.6 yaml_2.2.1 ggraph_2.0.3 MASS_7.3-51.6
## [49] plyr_1.8.6 qvalue_2.20.0 grid_4.0.2 blob_1.2.1
## [53] ggrepel_0.8.2 DO.db_2.9 crayon_1.3.4 lattice_0.20-41
## [57] graphlayouts_0.7.0 cowplot_1.0.0 splines_4.0.2 hms_0.5.3
## [61] pillar_1.4.6 fgsea_1.14.0 igraph_1.2.5 reshape2_1.4.4
## [65] fastmatch_1.1-0 glue_1.4.1 evaluate_0.14 downloader_0.4
## [69] data.table_1.13.0 BiocManager_1.30.10 png_0.1-7 vctrs_0.3.2
## [73] tweenr_1.0.1 urltools_1.7.3 gtable_0.3.0 purrr_0.3.4
## [77] polyclip_1.10-0 tidyr_1.1.1 xfun_0.16 ggforce_0.3.2
## [81] europepmc_0.4 tidygraph_1.2.0 viridisLite_0.3.0 tibble_3.0.3
## [85] rvcheck_0.1.8 memoise_1.1.0 ellipsis_0.3.1